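We systematize the approach to investigating deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness, we study mode connectivity and relative distances. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths composed of two straight lines in parameter space, i.e., polygonal chains with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results concerning connectivity in this setting. Our results hinge on symmetry removal and are in remarkable agreement with the rich phenomenology described by some analytical studies performed on simple shallow models.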
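The "polygonal chain with a single bend" mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy `toy_error` landscape, and the choice of bend point are all assumptions made for illustration, and the sketch works in a plain two-dimensional parameter space without any symmetry removal.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], s: float) -> Vector:
    """Point at fraction s along the straight segment from a to b."""
    return [(1.0 - s) * x + s * y for x, y in zip(a, b)]


def chain_point(w_start: Sequence[float], w_bend: Sequence[float],
                w_end: Sequence[float], t: float) -> Vector:
    """Point on a two-segment polygonal chain, with t in [0, 1].

    The first half of t covers w_start -> w_bend, the second half
    covers w_bend -> w_end.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if t <= 0.5:
        return lerp(w_start, w_bend, 2.0 * t)
    return lerp(w_bend, w_end, 2.0 * t - 1.0)


def max_error_on_chain(error: Callable[[Sequence[float]], float],
                       w_start: Sequence[float], w_bend: Sequence[float],
                       w_end: Sequence[float], num_points: int = 101) -> float:
    """Largest error found at evenly spaced points along the chain (the barrier)."""
    return max(
        error(chain_point(w_start, w_bend, w_end, i / (num_points - 1)))
        for i in range(num_points)
    )


def toy_error(w: Sequence[float]) -> float:
    """Hypothetical landscape: a wall of error 1 separates two zero-error basins."""
    return 1.0 if abs(w[0]) < 0.5 and w[1] < 0.5 else 0.0


w_a, w_b = [-1.0, 0.0], [1.0, 0.0]
# A bend at the midpoint gives the straight line, which crosses the wall.
straight_barrier = max_error_on_chain(toy_error, w_a, [0.0, 0.0], w_b)
# A bend placed above the wall gives a zero-error path.
bent_barrier = max_error_on_chain(toy_error, w_a, [0.0, 2.0], w_b)
```

In this toy setting the straight segment between the two minimizers hits the wall, while a single well-placed bend avoids it. How the bend is found in practice (for example by optimizing it) is not specified here.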
Translated by Google Translate
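We apply digitized Quantum Annealing (QA) and the Quantum Approximate Optimization Algorithm (QAOA) to a paradigmatic task of supervised learning in artificial neural networks: the optimization of synaptic weights for the binary perceptron. In contrast to the usual applications of QAOA to MaxCut, or to ground-state preparation for quantum spin chains, the classical Hamiltonian here is characterized by highly non-local multi-spin interactions. Nevertheless, we provide evidence for the existence of optimal smooth solutions for the QAOA parameters, which are transferable among typical instances of the same problem, and we demonstrate an enhanced performance of QAOA over traditional QA. We also investigate the role of the geometry of the QAOA optimization landscape in this problem, showing that the detrimental effect of the gap-closing transition encountered in QA also degrades the performance of our implementation of QAOA.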
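Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy the predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from the statistical physics of disordered systems to analytically study the computational consequences of overparameterization in non-convex binary neural network models, trained on data generated by a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes in the geometrical structure of the different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which, however, lie in sharp minima and are hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structure: wide regions of the weight space that are particularly dense in solutions and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.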
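The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly non-convex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima reached by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in non-convex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat structures emerge from the coalescence of minima around "high-margin" (i.e., locally robust) configurations. Despite being exponentially rare compared to zero-margin ones, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.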
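The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests that they possess better generalization capabilities than sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes-optimal pointwise estimators that correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum-flatness algorithms either directly to the classifier (which is norm independent) or to the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario through extensive numerical validation. Using two algorithms, Entropy-SGD and Replica-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g., ResNet, EfficientNet). An easy-to-compute flatness measure shows a clear correlation with test accuracy.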
This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and, under certain assumptions, show improved generalization over SGD using uniform stability. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
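The nested-loop structure described above can be sketched on a one-dimensional toy problem. This is a simplified illustration under stated assumptions, not the paper's implementation: the function name `entropy_sgd_1d`, all default hyperparameters, and the use of an exact gradient in place of minibatch gradients are assumptions. The inner loop runs noisy (Langevin-style) gradient steps on the loss plus a coupling term `gamma * (x_prime - x)` and keeps a running average `mu`; the outer loop moves the weight along `gamma * (x - mu)`, which serves as the estimated gradient of the negative local entropy.

```python
import math
import random
from typing import Callable


def entropy_sgd_1d(grad: Callable[[float], float], x0: float,
                   outer_steps: int = 50, inner_steps: int = 20,
                   gamma: float = 1.0, eta_outer: float = 0.5,
                   eta_inner: float = 0.1, noise_scale: float = 0.01,
                   alpha: float = 0.75, seed: int = 0) -> float:
    """Toy scalar sketch of an Entropy-SGD-style update (all defaults are assumptions)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(outer_steps):
        x_prime = x  # inner-loop iterate, restarted at the current weight
        mu = x       # running average of the inner-loop iterates
        for _ in range(inner_steps):
            # Gradient of the loss plus the coupling that keeps x_prime near x.
            drift = grad(x_prime) + gamma * (x_prime - x)
            # Langevin-style step: gradient descent plus Gaussian noise.
            x_prime = (x_prime - eta_inner * drift
                       + math.sqrt(eta_inner) * noise_scale * rng.gauss(0.0, 1.0))
            # Exponential moving average of the inner iterates.
            mu = (1.0 - alpha) * mu + alpha * x_prime
        # Outer step along the estimated gradient of the negative local entropy.
        x = x - eta_outer * gamma * (x - mu)
    return x


# Usage on the quadratic loss f(x) = x^2 / 2, whose gradient is x.
x_final = entropy_sgd_1d(lambda x: x, x0=3.0)
```

On this convex toy loss the iterate simply contracts toward the minimum at zero; the preference for wide regions only becomes visible on landscapes that contain both sharp and flat minima.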
In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can benefit greatly from additional training rounds over large-scale in-domain resources. However, these resources are often unavailable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. To reduce this gap, our work investigates two accessible approaches to deriving biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the lessons learned from the study offer valuable insights towards building biomedical language models that generalize to other less-resourced languages and different domain settings.
Quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain tasks, including machine learning. In this paper, we design, implement, and evaluate three hybrid quantum k-Means algorithms that exploit different degrees of parallelism. Each algorithm incrementally leverages quantum parallelism to reduce the complexity of the cluster assignment step, down to a constant cost. In particular, we exploit quantum phenomena to speed up the computation of distances. The core idea is that the distances between records and centroids can be computed simultaneously, thus saving time, especially for big datasets. We show that our hybrid quantum k-Means algorithms can be more efficient than the classical version while still obtaining comparable clustering results.
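For reference, the cluster assignment step that the abstract targets can be sketched in its classical form. This is only the standard classical baseline, not any of the hybrid quantum algorithms; the function names are chosen for illustration. It makes the cost explicit: every record is compared against every centroid, one distance at a time.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(records: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """Classical assignment step: index of the nearest centroid for each record.

    Cost is O(n * k * d) for n records, k centroids, and d features,
    because each distance is computed sequentially.
    """
    return [
        min(range(len(centroids)),
            key=lambda j: squared_distance(record, centroids[j]))
        for record in records
    ]


records = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
labels = assign_clusters(records, centroids)  # → [0, 0, 1, 1]
```

The nested iteration over records and centroids is the part that, according to the abstract, the hybrid algorithms execute simultaneously rather than sequentially.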
Warning: this paper contains content that may be offensive or upsetting. In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
A significant drawback of eXplainable Artificial Intelligence (XAI) approaches is the assumption of feature independence. This paper focuses on integrating causal knowledge in XAI methods to increase trust and help users assess the quality of explanations. We propose a novel extension to a widely used local and model-agnostic explainer that explicitly encodes causal relationships in the data generated around the input instance to be explained. Extensive experiments show that our method outperforms the original explainer in both the fidelity with which it mimics the black box and the stability of the explanations.